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ABSTRACT 


A Two-Dimensional Fast Recursive Least Squares (2-D FRLS) algorithm 1s pre- 
sented using a geometrical formulation based on the mathematical concepts of vector 
space, orthogonal projection, and subspace decomposition. 

By appropriately ordering the 2-D data, the algorithm provides an exact least- 
Squares solution to the deterministic Normal equations. The method is further extended 
to the general FIR Wiener filter and to ARMA modeling. The size and shape of the 
support region for both the MA and AR coefficients of the filter can be choosen 
arbitrarly. The ARMA parameter estimation problem is also considered for the case 
when the system inputs not available. 

Computer simulations are presented to illustrate the applications of the algoritm for 


2-D parameter estimation, svstem identification and image coding. 
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I. INTRODUCTION 


Adaptive algorithms have been used successfully for many years in a wide range 
of digital signal processing applications involving non-stationary data. In these appli- 
cations it is desired to follow closely the variations of the parameters characterizing 
the process, by updating (in real time if possible) estimates of these parameters as 
soon as new data is available. Real time implementation of these algorithms only 
recently became possible with the latest capabilities of the VLSI technology and is 
partly a result of the development of computationally affordable algorithms based on 
a very elegant mathematical formulation. This formulation is known as fast recursive 
least squares (FRLS) and is based on a geometric approach. The derivation of al- 
gorithms based on the geometric approach uses the concepts of linear vector spaces, 
orthogonality, projection matrices, and their relation with least squares prediction 
[Ref. 1, 2, 3]. Motivation for the development of similar algorithms to process two- 
dimensional (2-D) data is a consequence of the very interesting results reported lately 
in the literature on adaptive filters for one-dimensional (1-D) signals in what concerns 
their reduced complexity and excellent behavior even in non-stationary environments 
[Ref. 4, 5]. The development of Fast RLS algorithms for 2-D data based on the 


geometric approach is what 1s addressed in this thesis. 


A. PROBLEM FORMULATION 


A major problem with the extension of Fast RLS algorithms to two dimensions 
is that causality is not inherent in 2-D systems. This problem was overcome by 
associating the past of a 2-D signal with the region of support of a recursive filter mask 
(usually quarter plane or non-symmetrical half plane). By appropriately ordering the 


2-D data, a two-dimensional Fast Recursive Least Squares (2-D FRLS) algorithm 


is developed using a geometrical formulation where the vector spaces and all the 
notions associated with them are defined to reflect the 2-D nature of the data. An 
exact least squares solution to the deterministic Normal equations is provided for 
all of the all-pole (AR), all-zero (MA), or pole-zero (ARMA) models. The size and 
shape of the support region for both the MA and AR coefficients of the filter can 
be chosen arbitrarily as long as the overall system is recursively computable. The 
ARMA parameter estimation problem is also addressed for the case when the system 


input is not known. 


B. THESIS OVERVIEW 


The remainder of this thesis is organized as follows. Chapter 2 provides a sum- 
mary of the most common adaptive filtering techniques. Only brief reference is made 
the Least Mean Square (LMS) algorithm due to its simplicity and slower convergence 
properties. However a 2-D version of this technique that was recently reported in the 
literature is mentioned. Most of the chapter reviews the basic ideas of the Recursive 
Least Squares (RLS) algorithm. This provides preparation for chapter 3 where a fast 
2-D version of this algorithm is developed in detail. 

Chapter 4 is dedicated to applications of the new algorithm to the 2-D problems 
of systems identification, image coding, and parameter estimation. Different models 
(AR, MA, and ARMA) are considered. 

Chapter 5 summarizes the results obtained and suggests some possible improve- 
ments for the new algorithm. Mathematical derivations related to the geometrical 
formulation that are essential to the method, but too tedious to be inserted in the 


body of the thesis, are grouped in the Appendix. 


bo 


II. ADAPTIVE FILTERS 


A. PERFORMANCE CRITERIA 


In adaptive filtering the performance criterion is usually based on the minimiza- 
tion of a cost function dependent on the filter coefficients to be determined. The most 
common performance criterion 1s the minimization of the mean square error (MSE) 
associated with the signal to be estimated [Ref. 5, 6, 7]. In particular, if we consider 


a random process y(n) and a predictive filter of the form 


M 
y(n) = Ð_ ha(k)y(n — k) (1) 
El 


where y(n) is the predicted value and A,(k) are the filter coefficients, then the pre- 


diction error is defined by 


Net 


e(n) = y(n) — (n (2) 


and the MSE becomes 
Den) (3) 


where E is the expectation operator. For stationary data, this quantity is a convex 
quadratic function of the filter coefficients h,(’) and attains its minimum at a point 
where the partial derivatives with respect to each of the filter coefficients are simul- 
taneously equal to zero. Substituting (1) and (2) in (3) and simplifying, we obtain 


the desired expression 


0ເ 
Ohn(k) 





do fala na le or k = 1... M (4) 


The dependence of the performance criterion on the filter coefficients can be in- 


terpreted in terms of a multidimensional convex surface with a unique minimum. This 


surface is called the error-performance surface. The coefficients associated with the 
minimum mean-square error are obtained by solving the set of simultaneous equations 


in (4) which in this case are known as the Normal equations. 


B. LEAST MEAN SQUARE (LMS) ALGORITHM 
1. 1-D LMS 


The Normal equations can be solved by brute force using matrix inversion 
or by computationally faster methods such as the Levinson-Durbin algorithm for 
Toeplitz matrices. However here, we are interested in a method called the steepest 
descent, which provides an iterative solution to the Normal equations [Ref. 5, 6, 7, 8]. 
We start with a initial set of filter coefficients and a corresponding point on the error 
performance surface. We then compute the gradient vector formed by the partial 
derivatives of the mean-squared error with respect to each of the filter coefficients at 


that point. Using (4) the gradient vector can be expressed as 
Vín) = -2Ele(n)y(n)] (5) 


where y(n) isa M x 1 vector that contains the data covered by the filter mask at 


time n 


y(n) = [y(n — 1), yin — 2), 0m My? (6) 


Finally we update the coefficients by changing them in a direction opposite to that 


of the gradient vector using a predefined step size pu 
bass = by + 54e(-V(n)] = hy + HEle(n)y(n) (7) 
where h,, is a Af x 1 vector that contains the filter coefficients at time n 
hạ = [As(1). hạ(2),.... ha(M)]” (8) 


The inconvenience of this approach is that it requires an exact measure of the gradient 


vector at each iteration and the gradient involves statistical expectation. Usually the 


statistics of the data are not available and must be estimated from the raw data. 
Thus to obtain a really accurate estimate of the gradient is quite cumbersome and 
computationallv expensive. 

A practical method to estimate the gradient vector in a simple manner 
directly from acquired data is used in the Least Mean Squares (LMS) algorithm of 
Widrow [Ref. 5]. For this, the expectation in (5) is ignored and an instantaneous 


estimate of the gradient vector 1s taken to be 
V(n) = —2e(n)y(n) (9) 
The update for the filter coefficients then has the form 
boi = by + 54[-V(n)] = by + Heln)u(n) (10) 


This method is quite attractive for a wide range of applications since it requires no 
matrix inversions, correlation function estimation, or (actual) gradient computation, 
and hence has low computational complexity. However its convergence is relatively 


slow. A detailed derivation and analysis of the LMS properties can be found in [Ref. 


5. 6). 
2. 2-D LMS 


The extension of the LMS algorithm to 2-D signals is straightforward. 
Reference [Ref. 9] gives a detailed derivation and shows that the analysis presented by 
other authors for the 1- D LMS is also applicable to the 2-D version of the algorithm. 

The final form of the 2-D LMS algorithm is very similar to (10) but the 
vector containing the 1-D filter coefficients h, is substituted by a matrix W; con- 
taining the 2-D filter coefficients. The instantaneous estimate of the 2-D gradient 
uses the data matrix X; formed by the 2-D input samples covered by the 2-D filter 


mask at iteration 7. For a N x Af 2-D sequence, at sample y(n,m); if 7 is the linear 


scanning index 
j=mM+n (11) 


then the algorithm takes the form: 


1 ú 

Wij¡+ 3 ໄ້ + 5ມໄ- W; + ມໃ) (12) 
A separate derivation of this algorithm was also presented in [Ref. 10], together with 
some examples of its performance through computer simulation of a noise canceler 


and an adaptive line enhancer applied to an image processing problem. 


C. 1-D RECURSIVE LEAST SQUARES (RLS) 


In the adaptive methods presented above the need to solve the Normal equa- 
tions appears as a consequence of the minimization of a statistical cost function, 
the mean-squared error. However the implementation instead uses the actual data to 
compute errors and update the coefficients. This is the main cause of the performance 
deficiencies encountered when implementing this algorithm. 

Another possible approach is to base the performance criterion upon error mea- 
sures derived from the actual data. This class of techniques is known generally as 
Least Squares (LS) algorithms. 

The LS algorithm is designed to find the set of filter coefficients that minimize 


the cumulative sum of squared errors. 


en) O) (13) 


Although this seems very similar to the previous performance criterion, it results in a 
set of deterministic Normal equations whose solution provides filter coefficients that 
are exactly optimal, according to (13), for the acquired data instead of statistically 


optimal for a class of data as in the case of the steepest descent methods [Ref. 6]. 


To formulate this problem we once again differentiate the cost function with 
respect to the coefficients. However since we here use summations instead of expec- 


tations the result is 





den) = 
= —2r,,(0, k) EZ A) =0 for k= l, 00.9 M (14) 
Oh, (k) l=1 


where we define the deterministic correlation function r,(k, 1) as 
ເ. |) tor km =0,.... Af (15) 
t=1 


This set of Af simultaneous equations are the deterministic Normal equations. The 


equations are written in matrix form as 
R(n)h, = ກຸກ) (16) 


where R(n) is the Af x Af deterministic correlation matrix with the structure 


A a E) 
25 AZ (DMI 

R= | POD MBB = BAN a 
ND ce ເເຈນ ໃໍ່) 


and r(n) is the AJ x 1 vector of deterministic cross-correlation terms between the 


desired filter response and the filter inputs. 
tín) = [rn(0,1).rz(0,2), -- -ra(0, M)]” (18) 
If R(n) is nonsingular then the solution to the Normal Equations is formally 
h, = Rº(nr(n) (19) 


This brute force solution requires on the order of M? arithmetic operations. A better 
approach is to use a method known as the recursive least squares (RLS). This uses 


the Matrix Inversion Lemma [Ref. 5] to update the inverse correlation matrix and the 


T 


cross correlation vector as new data is acquired and thus to compute h, recursively. 


The resulting expression for R”!(n) is 


R7!(n —1)y(n)y?(n)R7}(n — 1) 


"` ˆ.- E 
E OSR noD- ARA (A Dg) = 
By defining the a priori error e(n|n — 1) as 
e(n|n — 1) = y(n) - y’(n)h.-ı (21) 
and the gain vector k(n) as 
R(n — 1)y(n) 
k <... 5 >... CỔ ÔNG 2 
k(n) 1 + y*(n)R=(n — 1)y(n) 22) 
we can rewrite (20) as 
R'!(n)=R!(n-1)-k(n)y (n)R!(n — 1) (23) 


If we then use r(n) = r(n— 1) +y(n)y(n) and substituting in (19) the desired update 


for coefficient vector h, is found to be 
h, = h,-: + k(n)e(nin —1) (24) 


To update the coefhcient vector as new data is acquired all we must do is to compute 
the last four equations assuming that all the parameters with index n—1 are available 
at time n. Since the non-singularity of the deterministic correlation matrix is a 


requirement for the solution of the problem, we must start with the initial condition 
RO) =" laggy (25) 


where cis a small positive constant. It also customary to initialize all the components 
of the coefficient filter h, to zero. 
The RLS algorithm is computationally more expensive than the LMS. The RLS 


algorithm requires a total of 3M(3 + M)/2 multiplications/divisions per iteration, 


while the LMS algorithm requires only 241 + 1. On the other hand the convergence 
performance of the RLS algorithm is much superior [Ref. 5, 6]. A detailed derivation 
of this algorithm and its overall performance can be found in many references such 
as Ret. 5, 6, 8). 

An enormous reduction of the computational complexitv of the RLS method 
1s obtained by using a geometrical formulation in its derivation. The computational 
complexity of the algorithm is reduced to approximately 6M arithmetic operations. 
The geometrical approach also provides an interesting interpretation of the prediction 
problem in terms of the concepts of vector spaces and orthogonality. Since we will 
use an expanded version the geometrical formulation to derive the extension of this 
method to 2-D signals, and the derivation is lengthy, we will not derive the 1-D method 
here. However a very comprehensive explanation of the geometrical approach for 1-D 
signals can be found in [Ref. 6]. 

For the case of nonstationary data it is frequently advantageous to incorporate 


a forgetting factor in the cost function 
ວ. 5, oe (26) 
ı=] 


The interpretation of thıs forgetting factor can be understood as an exponential win- 
dowing of the data in a fashion such that. the most recent data has a heavier influence 
in the cost function to be minimized. The fast algorithm is mathematically equivalent 
to the RLS, hence its stability is guaranteed in theory for any possible forgetting factor 
A [Ref. 7]. However the efficiency of this class of algorithms is a result of the reduced 
number of variables used to represent quantities such as the inverse deterministic 
correlation matrix. Due to the finite precision arithmetic used, the representation 1s 
only approximate. As a result the accumulation of round-off errors can set off insta- 


bility of the algorithm.The sensitivity of some quantities to round-off error are highly 


dependent on the forgetting factor used. This imposes a lower bound on À. Typical 


values for A are ın the range: 


0.95 < À < 1.0 (27) 
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Ill. 2-D FAST RECURSIVE LEAST SQUARES 


In this chapter we present the derivation of the 2-D FRLS algorithm. As men- 
tioned before we use the geometrical formulation to obtain a fast, computationally 
efficient algorithm. 

We start by introducing a convenient notation that closely follows the one used 
by Alexander [Ref. 6] for the geometric derivation of the 1-D FRLS. This is followed 
by a brief set of vector space considerations that are the basis of the problem solution. 
Next some auxiliary filters that use the same data as the 2-D filter are introduced. 
The key to this method is to find a relation between the parameters of these filters 


that permits the recursive update of all of them as soon as new data is available. 


A. 2-D PREDICTION FILTERS 


The method to be described applies to a general 2-D prediction filter of the form 
y(n;.n2) = S > ajyí(n O) (28) 
i J 


with (2,7) defined in a region that allows the 2-D AR model related to the prediction 
error process to be recursively computable (ex: quadrant or non-symmetric half plane 
support). The recursive computability is a requirement for applications where inverse 
filtering 1s used to recover the 2-D data sequence from the estimated error sequence 
as in most of the image coding schemes. To be specific and develop clear notation we 
will assume a first quadrant (NV + 1) x (M + 1) filter of the form 
N M 
y(n],n2) = 3 du dis Or) (7,7) # (0,0) (29) 
i=0 j= 


and a 2-D data sequence with A x L points as shown in Figure 1. 
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Figure 1. First Quadrant (N+1)(M+1) Filter 


The prediction error e(n,,n2) = y(n;,n2) — y(n,,n2) can be expressed using 


vector notation as 
e(ny,n2) = y(n,,n2) — y (m;no)a (30) 


where 


y(ni,no) = (y(n u ແ lý ‚y{nı ວສ N,n2),y(n1,n2 = DR aa 
ya FN AUS 
---,y(n,,n2 — M),---,y(n, — N,n3— M)]? (31) 
isa (N + 1)(41 + 1) — 1 dimensional vector formed by the elements covered by the 
filter mask, ordered along rows, and 


a= [đno, - - -,0N0, 001,4 013º", 00M, 0º H (32) 


is the vector of 2-D filter coefficients with the same ordering. We assume for now that 


ylk + 0494.<422# O 


When performing linear prediction along one of the possible directions of recur- 
sion (e.g., along rows), and all of the data necessary for each prediction is available, 
the optimal filter (least squares criterion) up to point (n,.n2), will be defined as the 
one having a set of coefficients a;;(n1,n2),(O<72< N ,0<37 <M, (2,7) € (0,0) ) 


that minimizes the sum of the squared errors along that recursion direction. 


ni K-1n2-1 
(my, n2) = 2 Je(i;na)] + 2 à le(i,5))º (33) 


B. 2-D DATA ORDERING 


One question that arises whenever we deal with finite extent sequences is what 
to do when we approach the boundaries of the 2- D data sequence and the prediction 
mask needs to cover points that hie outside of the region where the data is defined. 
One approach is to set the points outside of this region (i.e., the boundary conditions) 
to zero. The inconvenience of this approach is that the boundary conditions depend 
not only on the extent of the 2-D sequence but also on the shape of the filter mask 
and this can lead to additional complications [Ref. 11]. In addition, when we reach 
the end of a row and we start a new one, the data under the mask is almost all reset 
to zero. This causes a strong discontinuity in the process. 

An alternative approach is to assume that, although the 2-D data we process 
may not be stationary, the statistical properties of the data do not vary too rapidly, 
and so to use the data at the end of one row as the initial condition for prediction 
along the next row as is shown in Figure 2. This appears to be at least as reasonable as 
the first approach. It will be seen later that this approach also has several advantages 
in deriving an algorithm based on the geometrical approach. From a practical point 
of view, it is as if we fold the 2-D data plane and form a cylinder with perimeter 
equal to A, but the data rows instead of folding into themselves, are misaligned by 
one row. This allows the prediction to be performed along rows with the 2-D mask 


moving from bottom to top in a helical fashion. 
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Figure 2. Data Ordering 


C. PREVIOUS DATA/ PAST OBSERVATIONS 


Performing linear prediction on a 2-D signal along rows implies that the new 
data comes only from a strip with the width of the filter mask (M 41). This suggests 


an analogy with the (Af + 1) channel 1-D prediction problem. 


1. (M+1) Channel Analogy 


The (Af + 1) rows of the data strip can be viewed as (M + 1) channels of 


a 1-D signal. To support this idea, define a linear index n such that 


n=n:X K +n (34) 
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Then the 2-D data sequence y(n,,nz2) can be expressed ın terms of the data in the 


first channel as 
ylmı,nz) = yıln) (35) 


We can also express the data in the other channels in terms of y,(n) as 
yi(n) = y (n — (2 — 1)K) font M bal (36) 


We will predict along the first channel using data from all channels defined by the 
2-D filter mask. A consequence of this approach is that the lengtl of data used from 
each channel depends on the shape of the 2-D mask. Figure 3 shows the particular 
case of a Quarter Plane mask. In order to predict y;(n) = y(n,,1n2), the data used 
is formed by N samples of channel 1 (y) and (N+1) samples of channels 2 (y2) to 
M+4+1 (yarai). This requires a total of (N +1) x (M + 1) — 1 coefficients as in the 
2-D mask. 


Channel 1 
Channel O 


Channel 2 


271 Channel M+] 





Figure 3. M+1 Channel Analogy 
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2. Past data 


The multichannel analogy will be used to define what we call the past data 
and the observation data because the notion of multichannel prediction will be useful 
later. From now on we will drop the 2-D notation and use the index n associated 
with the recursion along the rows. 


Begin by defining the (n + 1)-dimensional vector 
y.(n) = [y:(0), y, (1), 2. ‚yiln))? | < ? = M aa (37) 


that contains data in channel 7 up to the point n. Further define the delay operator 


27* such that 


“y (n) = [0.0,-,y:(0), y:(1), ule — ENS (38) 


is a (n + 1)-dimensional vector that contains the data of y (n) delayed by k samples 
and pre-windowed. We call y (n) the observation data since it contains the data 
sequence we desire to predict. The past data with respect to y(n) 1s formed by all 
of the points y,;(7) such that (2 <i<M+F1,0<3£n) or (; =1,0 FEA 
as shown in Figure 4. Note that different channels have data in common. With the 


new notation defined we are ready to reformulate the problem. 


D. PROBLEM REFORMULATION 


We start by redefining (31), the vector that contains the data covered by the 


2-D filter mask, as 


vay = DHL E 1), ` ‚Yıln — N),y2(n), o -,yM+ı(R), E sA+1(n Es Ny” (39) 


where the subscript (1,N) denotes the fact that the data in the first channel appears 


delayed by 1 to N samples. We want to find a prediction filter of the form 


th(n) = y; y(n)a(n) (40) 






Figure 4. Past Data 





with 


a(n) = [đio(n). u ‚ano(nr), Ts a ani(n), K dou Ln), T anmí(n)]? 


that minimizes the sum of squared errors 


an) = la)? 
1=0 
where the prediction error given by 


A 


en) = y(n) — ya(n) 
This can be written in vector notation as 
a(n) = ei (n)eı(n) 
where 
e(r) = y,(n)-¥,(n) 
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* 
Y 


(41) 


(42) 


(44) 


(45) 


and 


(n) (46) 


I 
pm, 
3 
SS 
| 
= 
~ 
ຽ 
| 


where Yı a(n) is a data matrix formed by the data covered by the 2-D mask from 


the origin up to time n. The matrix Yı n(n) can be written as 


Bar) = : (47) 


T 
Xin 
or using a different partition. Y, y(n) can alternatively be written in terms of the 


data in the M +1 channels and their delayed versions (37) and (38) as 


A. 


Yin(n) = [Ty (nm). 0,2 l yay (nhay) a ral) (48) 


As will be shown later, a necessary initial condition for the geometric formulation 
to work is that we start at a sample y,(0) such that S =0. Thả in. 
initial conditions for the data under the 2-D mask must be zero. Now let us proceed 
with the minimization of (42). The least squares solution for a(n) is given by the 


pseudo-inverse: 


a(n) = (Yƒx(n)Yax()) ¥7y(n)y,(n) (49) 


where (Y?(®)Y¡x(n)) can be interpreted as the inverse of a 2-D deterministic 
correlation matrix and Yin (n)y,(n) can be interpreted as the vector of the deter- 
ministic cross-correlations between the observation data and the past data. A new 
expression for e,(n) is obtained substituting the solution for a(n) in (46) and using 


the result in (45). This yields 


ein) = y, (n)- P,nm(n)y,(n) 


(1= P,n(n)) y, (n) (50) 
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where 
-1 
a Y la) (51) 


is interpreted as the projection matrix that projects vectors into the subspace spanned 
ວ 
by the columns of Y, n(n). (Note here that it is assumed that o) 


exists.) Also define 
Pin(n) = I-Pıx{n) (52) 


as the orthogonal projection matrix associated with the same subspace. 
Both the L.S. estimate of y (n) and the prediction error can now be expressed 
in terms of the projections matrices. The estimate ý (n) is the projection of y (n) 


onto the subspace spanned by the previous data 

ý (n) = Pin(n)y (n) (53) 
The error e, (n) is orthogonal to the estimate y, (n). 

en) = Pix(n)y,(n) (54) 

Next, we define the operator K; n(n) as 
Kix(n) = (Yñy(m)Yix(n)) ໃ້, (ນ) (55) 

hence 

a(n) = Kin(n)y,(n) (56) 


K; a(n) can be interpreted as the operator that computes the best LS filter a(n) for 
predicting y,(n) given the data set Yi, n(n). Now since y (n) can be obtained from 
y,(n— 1) as soon as y(n) is available, if we find an efficient way to get Kı x(n) from 


K; x(n — 1) then we will be able to update a(n — 1) to a(n). 
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E. VECTOR SPACE CONSIDERATIONS 


Before going any further we must present some concepts and relations associated 
with vector spaces that will be needed later on. To make the results generic, in this 
section our data matrix is called U(n). This generic data matrix can represent the 
matrix Yı x(n) defined in (47) and (48) and other similar matrices that are defined 
later in this chapter. Then following the definition of (51) and (52), the projection 


matrix associated with U(n) is 


Py(n) = U(n) (U"(n)U(n)) U7(n) (57) 
and the orthogonal projection matrix is 
Po(n) = I-Py(n) (58) 


The columns of U(n) are formed by the vectors that span the vector space associated 
with Py(n); hence when we compute Py(n)U(n), we have 
Py(n)U(n) = U(n)(UT(n)U(n)) UT (n)U(n) = U(n) (59) 
ep m mn nn 


I 


It follows from this that the projection matrix is idempotent 


PunPun) O (U7 (n)U(n)) | Uf(n)= Pu(n) (60) 
Ủ(n) 


The following relations also follow from the definition of Py (n) and Pù(n). 


PG(n) = Pu(n) (61) 
Pb(n)Pb(n) = Pb(n) (62) 
Pt (n) = Pủ(n) (63) 
Ph(n)Pu(n) = Dialisis) (64) 


All of these relations are easy to prove. 
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Now let us append a matrix V (1.e., more columns) to the matrix U(n). The 
columns of V do not need to be orthogonal to the subspace spanned by U(n). hence 
the subspace spanned by [U(n), V] is the same as the one spanned by [U(n), W], 


with 
W = Py(n)V (65) 
Then we define 


Puv() =  Pu(n)+ Pw(n) (66) 


= Py(n) + Pö(n)V (IPb(n)V]TPb(n)V)— VTPð(n) 
If we note that Puy(n) = 1— Puv(n); then it follows that 
Póv(n) = Pũ(›)~— Pö(n)V ([Pb(n)VỊTPb(n)V)” VTPb(n) (67) 


These results also are valid when V is a vector (1.e., a matrix with a single column). 
Some other useful relations with generalized vectors or matrices y and z which 


follow immediately from (66) and (67), are 


Puv(n)y = Pu(n)ìy+P(n)V(ÍPb(n)VJTPb(n)V) V7PL(m)y (68) 
Pùv(nìy = Pb(a)y— Pb(n)V(ÍPö(n)VƑPb(x)V) VfPb(n)y (69) 


z'Puv(n")y = 2’ Py(n)y (70) 


+ zTPb(n)V (Pb(n)VITPb(n)V)_ VfPb(n)y 


z'Puv(®)y = z’Pü(n)y (71) 


z"Pb(n)V ([Pb(n)VITPÿ(n)V) V*Pb(n)y 


These relations with the appropriate choices of U(n), V, y,z will be the basis of 


several recursions needed later. 
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1. Projection Matrix Time Update 


Let us partition the data matrix U(n) as 


U(n-1 
Ulm x . (12) 
u 





where u? is the last row of U(n). We can see from (47) that this is a valid repre- 
sentation when U(n) is taken to be Y, (ກ). In this case u? corresponds to Yara 
It is seen that U(n) is formed by the columns of U(n — 1) with one more dimension 
appended, the components of u?. Now define a (n+1)-dimensional vector z(n) called 


the unit time vector [Ref. 6] 
z(n) = [0,0,0,---,0,0,0,1]* (73) 


and append it to the data matrix U(n) to form [U(n), z(n)]. It will now be shown that 
the subspace spanned by this new matrix contains not only ý (n) but also y,(n — 1). 

To see this, proceed as follows. We know that if ý (n) lies in the subspace 
spanned by U(n). then appending z(n) will not change a thing. Using (68) with 
V = a(n) and y = y,(n) we obtain 


Pu-(n)y (n) = Puln)y (n) (74) 
y, (n) 
+ Pb(r)z(n) ((Pó(n)z(n) Pb(n)a(n))  27(n) PH(n)y, (n) 
0 


To see that y (n — 1) also lies in this subspace, note that since x(n) has only its 
last component non-zero, a linear combination of the vectors in [U(n), z(n)] can be 
used to obtain a matrix whose columns span the same subspace as the columns of 
U(n — 1). 


It can be shown that Py,(n) has the particular form 


Pyín-—1) 0 
Py, (n) = ຕ | E (75) 
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(a detailed derivation of this result is presented in the Appendix). Further recall that 


y,(n) is a vector formed by all the data from the origin up to point n. Thus (75) 


provides a way to decompose a projection matrix into past and present components. 


Pu(n-1) 0 || y (n-1) 


Py, (n)y,(n) = (76) 
0“ 0 y(n) 
2 O 0 de) P ya) 
0” 1 yi(n) y(n) 


2. Angle Parameter 


The vector z(n) also provides a way to quantify the change of subspaces 
when we update U(n — 1) to U(n). First note that the inner product of two vectors 
gives the cosine of the angle between them multiplied by the product of their lengths, 
and also that the length of a vector resulting from the projection of any vector into 
a subspace is given by the length of the original vector multiplied by the cosine of 
the angle between the vector and the subspace (this angle has always magnitude 
< 5). Now observe that r(n) is a unit vector orthogonal to the subspace spanned by 
U(n — 1), and Py,(n)z(n) is a vector orthogonal to the subspace spanned by U(n) 
with length equal to the cosine of the angle between x(n) and this subspace. Then 


defining the inner-product as 


we obtain 
iG) = (x(n), Py(n)z(n)) າ...“ = os O (78) 


where @ is the angle between the components of z(n) that are perpendicular to both 
subspaces {U(n)} and (U(n—1)). The variable y(n) will be used to update K, x ({n). 
In order to begin the derivation of the recursive procedure we now introduce 


three auxiliary filters: 
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— Forward Multichannel] Prediction Filter 
— Backward Multichannel Prediction Filter 
— Gain Transversal Filter 


These are discussed next. 


F. AUXILIARY FILTERS 


The reason for the auxiliary filters will become clear later when it is shown that 


the update of a(n — 1) can be expressed in terms of the auxiliary filters parameters. 


1. (M+41) Channel Forward Prediction Filter 
We begin by defining a (M+1)-channel signal xp(n) formed by the data 
acquired by the 2-D mask when it is moved from n to n +1: 
xp (1) > [y(n) yola +1). NM on + 1) (79) 


We want to find the best LS filter that predicts xr(n) based on the data y, y(n) 
covered by the 2-D mask (see Figure 5). Let the coefficients of this filter be defined 


by a (N + 1)(A1+1)- 1) x (M +1) matrix of the form 


F(n) = En ).fạ(n). Me Du on) (80) 


where each of the ((N + 1)(M +1) — 1)-dimensional vectors f;(n) for (1 <7 < M +1) 
is comprised of the multichannel prediction coefficients for channel 7 with the same 


support as a(n). The prediction of x(n) is given by 


xp(n) = FT(n)y, y(n) (81) 
and the prediction error is 
er(n) = xp(n) — Xr(n) (82) 
If we define 
Xr(n) = ly (m).y,(n+1),+ Y 1, (1 +1) (83) 


= [xr(0).xr(1),:--,xe(n)]” 
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Figure 5. Forward Filter Mask 





then Xp(n) is a (rn + 1) x (Af +1) matrix that contains all the multichannel data 


from the origin up to current value of the index n. The estimate of Xr(n) is thus 


\ 


Rela) = Yin(n)F(n) 

and the prediction error, also a (n + 1) x (Af +1) matrix, is 
Er(n) = Xp(n) — Xr(n) 

The error covariance for the multichannel Forward Filter is: 

Đr(n) = Ep(n)Er(n) 
Since we desire F(n) to minimize the error energy 

ir [Zp(n)] = tr [EZ(n)Er(n)| 
the optimal LS filter is again obtained using the pseudo-inverse of Yi n(n) 
Fin) = (Yo) Y ina) ` YTn(n)Xr(n) 
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(84) 


(85) 


(88) 


This can be expressed using the operator defined in (55) as 
F(n) = Kin(n)Xr(n) (89) 


The orthogonality principles mentioned before also apply. That is, the estimate Xr(n) 
is the projection of the columns of X-(n) onto the subspace spanned by the columns 
of the matrix containing the previous multichannel data (which in this case is the 


same as the 2-D data). 
Xr(n) — Pın(n)Xr(n) (90) 


The error Er(n) is orthogonal to the estimate Er(n), i.e., the columns of these 


matrices span subspaces that are orthogonal to each other. 
Er(n)=Py.x(n)Xr(n) (91) 
If ef (n) is defined as the last row of Er(n), it can be obtained using z(n) as 
er(n) = zT(n)Er(n) = z(n) Piy(n)Xr(n) (92) 
2. (M+1) Channel Backward Prediction Filter 


For the backward prediction problem we define a (M+1)-channel signal 
x p(n) formed by the data left out by the 2-D mask when it is moved from time n to 


n + 1. This is given by (see Figure 6) 
xpln) - mir AT NS MỊ (93) 


Now define a (n+1)x((N+1)(41+1)—1) data matrix Yo y-1(n) that has a structure 
similar to Yı n(n) (47,48) 


Yo NI = . (94) 





or 
ສ + tua a rs Y ap 601 (05) 

We note that 

AS ເບ ປ) (96) 
and 

Wail =e, Xo wen) (97) 
The problem is now to find the best LS filter that predicts xp(n) based on the data 
matrix Yo.v-1(7). Let the coefficients of this filter be defined by the ((N + 1)(Af + 
1) — 1) x (Af 4+. 1) matrix 


B(n) == bu), b,(n), o ‚buzi(n)] (98) 


where each of the ((N + 1)(41 +1) — 1)-dimensional vectors b;(n) for (1 <1< M+1) 


is comprised of the backward prediction coefficients for channel +. The support of 
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this filter is the same as the support of a(n) but shifted one sample to the right (see 
Figure Gyr 


The prediction of xg(n) is given by 
xp(n) = B(n)y, y ..(n) (99) 
and the prediction error is 
eg(n) = xg(n) — Xp(n) (100) 


We continue to proceed as we did for the forward multichannel filter. The (n +1) x 
(M + 1) matrix that contains the backward multichannel data from the origin up to 


n is defined as 
Xaa) > | (m).2My,(m), 2 MY yy, (m)| 
= [xp(0),xa(1),+++.xa(n)]" (101) 
Then the estimate of Xg(n) is 
X p(n) = Yon-1(n)B(n) (102) 
and the prediction error (also a (n +1) x (M + 1) matrix) is 
Ep(n) = Xp(n) — Xa(n) (103) 
The error covariance matrix for the multichannel backward filter is then given by 
Eg(n) = Esg(n)Eg(n) = Xp(n) Poy-¡(n)Xg(n) (104) 
To minimize the error energy 
tr [Ee(n)] = tr [Es(n)TEs(n)] (105) 


our optimal LS filter is, once more, obtained using a pseudo-inverse, but this time for 
the data matrix Yon-(n). 


-] 


B() = (Yến ¡(n)Yow-i(n)) Yon-ıln)Xa(n) (106) 


28 


Finally defining a new operator Ko, -1(n) in terms of the new data matrix Yo n-1(n) 
Kox-ıln) = (Yix-ın)Yox-ıln)) Yar-ıla) (107) 

we can rewrite (106) as 
B(n) = Kon-1(n)X g(n) (108) 


The orthogonality principles apply once more. The estimate Xs(n) is 
the projection of the columns of Xg(n) onto the subspace spanned by the previous 


backward multichannel data 
Xa(n) = Pow-1(n)Xp(n) (109) 


where Pon-1(n) 15 the projection matrix associated with the vector space spanned 


by the columns of the new data set Yo y-1(n). 
= 
“າ NE ern) Nero) (110) 


The error Ep(n) is orthogonal to the estimate Xp(n), 1.e., the columns of these two 


matrices span subspaces that are orthogonal to each other. 
Ea(n) = Po.y_i(n)Xa(n) (111) 
Defining eZ(n) to be the last row of Eg(n) we have 
es(n) = x(n) Es(n) = z(n) Pon- (n)XB(n) (112) 


3. Gain Transversal Filter 


The gain transversal filter does not relate to specific prediction operations 
for the data but rather provides another way of quantifying the angular change y(n) 
between the subspaces associated with data matrices at times n and n— 1. To begin, 


consider the projection of the vector x(n) onto the subspace spanned by Yo,n-1(n). 
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Since this projection Po,n-i(n)x(n) is contained in the subspace of Yon-ı(n), we 


can express it as a linear combination of the columns of Yon-ı(n). We write this as 
Pon-1(n)z(n) = Yon-(n)g(n) (113) 
where g(n) is a ((N+1)(41+1)—1)-dimensional vector of weights. Note that (113) can 
be interpreted as the LS prediction of z(n) based on the data matrix Y y y-¡(n) where 
g(n) is the ((N + 1)(41 + 1) — 1)-dimensional vector of filter prediction coefficients. 
The estimate of z(n) is thus given by 
(ນ) = Poy-1(n)z(n) = Yo,n-1(n)g(n) (114) 
and the prediction error 1s 
e-(n) = zin) - Pox-ı(n)z(n) = Pin. (n)z(n) (115) 
The last component of e.(n) can be obtained using z(n) and turns out to be equal 
to y(n) as in (78) 
e-(n) = min) Po y_y(n) ata) = yn) = cos 0 (116) 
Then substituting the middle part of (115) in (116) we obtain 
y(n) = (x(n), x(n) — Pow-1(n)z(n)) = 1 — (z(n), Pon -1(n)z(n)) (ປປ 


and using (113) in (117) we find 


2(1) > 1 -- (zn). Yox-ı(n)g(n)) = li Yo nar E) OO (118) 
where Vo en is the last row of the data matrix Yo,n-1(n). If we now recognize 


that cos? 6 = 1 — sin’ 8 we see that 


T E. 
Yo. n— (7) g(7) = sie (119) 
If we use the LS criteria to get g(n), the solution is again, given in terms of a pseudo- 


inverse, or more conveniently in terms of the operator Ko y-1(n) of (107) 


g(n) = Kon-(n)z(n) (120) 


This is in the same form that we have for the other filters. 
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G. FILTER UPDATE PROCEDURES 


In this section we determine the time update for the four filters defined so far. 
We begin by showing how to update K; a(n) and Koy-1(n). This result will be used 
for updating each of the four filters. 


1. Transversal Operator Update 


Since the operations to be described now are common to all four filters, 

we develop the formulas in this subsection in terms of the generic matrices U and V 
introduced in section E of this chapter. Beginning with (66) we have 

Puv(n) = Py(n)+ Pg(n)V ([P&(n)V]’P&(n)vV) V?P5(n) (121) 


where U is a (n+1)x((N+1)\(M +1)— 1) matrix and V isa (n+1)x(M +1) 


matrix. By definition 


Puv(») = [U. V] ((U, VJ"(U, V))(U, VJ” (122) 
and 
Kyv(n) = ([U, ປບ, ທ) [ປກ (123) 
It follows that 
Kuv(n) = Kuv(n)Puv(n) (124) 


hence from (123) 


Kuv(n)[U, V] T+ (MAD AMP ATUM ADAM] (125) 


Ivin: On xM' | 


Omxn ImxM 


with N = ((N+1)(M41 + 1)—1) and M' 





(M +1). Now we can partition (125) and 


write it as two equations, namely 


Kuv(n)D 





I / 1 
N'xN (126) 
Or x N) 
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and 


Kuv(n)V = (127) 





ON» AM 
Lr: x M 


The first equation can be used in conjunction with Py = U(n)Kvy(n) to obtain 


Lx NT |Ken a Ku(n) | 


Omxn' UM: «(n+ 


Now substituting (121) in (124) and using (125)-(128) to simplify the resulting ex- 


| 


x (Pö(n)VITPb(n)V) V’Pt(n) 


Kuv(n)Pu(n) = (128) 








pression we have 


Ku(n) 


O Mx(n+1) 


Ku(n) 


O M'x(n+1) 


(0 .,,., 
Kee = N'xM 




















vị (129) 


Tap yar 


By forming [V, U] and following similar procedures, it is straightforward to show that 


ວ” 
4 | M'x(n+1) | vị (130) 


Ky(n) 
x (Pb(n)VPb(n)V) VTPb(n) 


Onmx(n41) ]1/)/! 


Ky(n) 


Kyu(n) = x 
N'xM! 

















For the particular case when V = z(n) these relations are also valid but we can obtain 


them from the derivation given in the Appendix. The result 1s 





Ky(n - 1 0 
Ky,(n) = ` (131) 
—u“Ky(n = 1) ] 
To check this, note that for V = z(n) (124) can be written as 
EU A ee (132) 


Then equating these two ways of obtaining Kuy ,(n) we can update Ky(n — 1) to 


Ku(n) 














Ku(n-1) 0| | Ku(n) | | Ku(n)zín) = 
=u Kyn D) 1 0 = 
x (IPb(n)z(n)fPb(n)z(n)) z7(n)Pb(n) 





2. 2-D Filter Update 


We now develop update formulas for the 2-D filter. From (56) we have 


a(n) = Kin(n)y,(n) (134) 


To find a(n) in terms of a(n —1) we use (134). Post-multiplying by y (n), and taking 
V = z(n) and U = Yı n(n) yields after simplification 


Kin(n — 1) a = ES (135) 
= (a — 1) 1 0 





yi(n) 








At this point we note from (97) that since 


Non) 








(136) 
with y, (0) = O as defined initially. then 
07 
Yi) = (137) 
Yon-1(n = 1) 
From this it follows, using the respective definitions, that 
Ki (7) = [0,Ko,n-ı (n zu 1)) (138) 
and 
01 
P). 0= u (139) 
0 Pon-ı(n -1) 
These results in conjunction with (118) and (120) allow us to write 
g(n— 1) = Kon-1(n — 1)z(n — 1) = Ki n(n)z(n) (140) 
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and 
ya -1) = 1- (ein 1), Yor-ıln -Vgir-1))=1-yl,, 
= 1- (z(n), Yın(n)g(n — 1)) =1— y7 v(n)g(n — 1) 
and also 
y(n — 1) = zT(n)P‡x(n)z(n) 


The upper part of (135) then becomes 


4) = ath) TI 
a(n 1) = a(n) gl De 

E e(n) 

a(n) = a(n — 1) + g(n ETE Ml 1) 


To get €,(n) before updating a(n — 1) we substitute (144) in 


eı(n) = y (n) — y, ylnjal(n) 


to obtain 


ej(n) 


e(n) = ej(nin — 1) — ne ee 
(n) = e(nin O =1) 


where we define 


e(n|n — 1) = y(n) - y7 (n)a(n —1) 


Now (146) can be simplified using (141) 


e(n) 


ຂະ ຂຄວ ເອດ il) ສ” 


and finally (144) can be written as 


a(n) = a(n — 1) + g(n — 1)Jer(nin — 1) 


(n — 1) (0 
(141) 


(145) 


(146) 


= eı(n|n = 1)+(n = 1) (148) 


(149) 


At this moment we have all we need to update a(n — 1) to a(n) assuming that all 


variables with index n — 1 are available. However we want to find 


g(n) and y(n) in 


order to proceed to update a(n) to a(n +1) as soon as Yj y(n +1) is available. This 


is discussed next. 
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3. Gain Filter Update 


Begin by forming a new data matrix Yo y(n) that can be expressed in terms 
of Y, n(n)or Yo y-¡(n) and the other matrices Xp(n) or Xg(n) after a multiplication 
by suitable permutation matrices. Each row and column of the permutation matrix 
contains a single 1 with all the other entries equal to zero. The positions of the 1’s are 
chosen so that when this matrix premultiples one of the data matrices it rearranges 


its columns to conform with the desired order. We write this as 
Yonín) = (Xp(n), Yi n(n) Vr = [Yon-1(n), Xp(n)) Vs (150) 


Now form Ko,n(n) and post-multiply the result by r(n) using the first part of (150) 
and (130) with V = Xp(n) and U = Y, n(n) to obtain 


Da Doxa 
4 Ex n 
Kon(n)z(n) = WE x(n) + Up ແ... (151) 
Kin(n) E rn, 
-F(n) 


= 
NS າບ ເນ ແ, aln) 
%==—————.~———~ seo. SS 
1. (ກ ) er(n) 


Since the permutation matrices are orthogonal (W7 = W-1) we have 


Ont'x(n41) Inxar 
el |) Se 


VrKon(n)r(n) T 
Kin(n) —F(n) 


Sr (n)er(n) (1 


ot 
t9 
ço” 


By analogy with the definition of gain filter (120) we can define 


g'(n) = VrKon(n)z(n) (153) 


OT 


Weg (n) = Kon(n)z(n) (154) 


as the (N+I)(M +1)-1]+(M + 1)-order LS predictor of z(n) using the permuted 


data matrix Yo,n(n)YZ. Equation (152) can be rewritten using (140) as 


Om 
g (n) = 








I 1! 1! 
+ | nia | 5E'(n)er(n) (155) 
mã -F(n) 

An alternative way to find Ko y(n)z(n) is to use the second part of (150) 


in (129) with V = Xg(n) and U = Yoy-1(n) to obtain 


ie | al (156) 
OM'x(n+1) 


Kon(n)x(n) = we 





— Ko n-1(n)X p(n) 


+ VỆ ອິ (n)eg(n) 








Tyg’ at 
Now similarly define 


g(n) 
Om 


—B(n) 


g (n) = WpKo wy(n)z(n) = Ya (njeg(n) (157) 














LM: xM 
as the [((N +1)(Af +1)—1}+ (Af 4+1)-order LS predictor of z(n) using the permuted 
data matrix Woe cs 

Weg" (n) = Veg (n) (158) 
it follows that 


g'(n) = Vo Vrg'(n) (159) 


It is useful to partition g”(n) as 


M(n 
cu: | l) (160) 


where M(n) isa (N +1)(41 +1) — 1 vector and m(n) is a (Af +1) vector. The lower 


=F 


partition of (157) is then 
m(n) = Đg`(n)ep(n) (161) 
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This can be substituted in the upper partition of (157) to give 
g(n) = M(n) + B(n)m(n) (162) 


Thus it is seen that if g'(n) is available, we can obtain g(n), provided that we have 
F(n), B(n), and all the associated parameters already available. 
The inversions of XZp(n) and Xg(n) can also be carried out recursively using 


the matrix inversion lemma (details are given in subsection 7 of this chapter). 


4. Forward Filter Update 


To update the forward filter, proceed as follows. From (89) we have 
F(n) = K,n(n)Xr(n) (163) 


To find F(n) in terms of F(n — 1) we use (134) post-multiplied by X-(n) with V = 
m(n) and U(n) = Yı,x({n) to obtain 

















Ki xv(n-1 0 X — 1 
AU ) p(n ) (164) 
Y Kin (n —- 1) 1 xL(n) 
K,n(n) Kı,n(n)z{r) | x! (n)Piv(n)Xp(n) 
= Xr(n) — S a EERE 
u zT(n)Piy(nyEn) 
Then using only the upper partition we find 
er(n) 
Ria=1)= Ein) - en = ——. 165 
(n= 1) = Fn) = gin = 1) FE (165) 
or 
i 
F(n) = F(n— 1) + g(n — e (166) 
= y(n — 1) 
To compute er(n) before having F(n — 1) we note from (81) and (82) that 
ep(n) =xp(n) — F7(n)y, „ím) (167) 
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and substitute (166) to obtain 


(ກ) 


T = T ". T ຂະ = 
ep(n) = er(n|n — 1) Yin Ign e E (168) 
where we define 
er(n|n — 1) = xz(n) — F“(n — ອກ) (169) 
Now (168) can be simplified to 
y T er(n) T 5 
epa Se A 0) aay cee) mE er(njs— 1)ynz—1J7,... 
Hence from (166) we have 
E(n) = Ein 2 — Def(nl|n — 1) (171) 


This is the desired update formula for F(n). 
To compute p(n) we use (71) with U = Y;ní(n);V = AMS 


y = Xp(n) and. also (75) to obtain 








T 
Xe Pin X = Ep(n) eee) (172) 
Rs y(n — 1) 
Then using (170) we obtain 
Pra(n—1) 0 
X7(n) G Xr(n) = Ep(n) - er(n)ek(n|n - 1) (ດ 
0” U 
Finally, simplifving (173) we find 
Sp(n-1)= Đr(n)T— ep(nje£(nin — 1) (174) 
or 
Sp(n) = Esr(n — 1) + er(nje£(nin — 1) (175) 


Thus all the parameters for the forward multichannel prediction problem can be 


updated from the values obtained at the end of the (n — 1) recursion. 
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5. Backward Filter Update 


The update procedure for the backward filter is similar to that for the 


forward filter. From (108) we have 
B(n) = Kon-1(n)X g(n) (176) 


To find B(n) in terms of B(n — 1) we use (134) with V = z(n) and U= Yon-1(n) 


and postmultiply by Xg(n) to obtain 


























Kon-1(n — 1) 0 97. 7 (GEO 
(7 
a a ln —1) 1 xL(n) 
u Kon-(n) = Kon-1(n)z(n) x! (n)Pona(N)Xp(n) 
0 -1 #T(n)Pqw-¡(®)z(m) 
Then using only the upper partition of this equation we obtain 
eL(n) | 
B(n — 1) = B(n) — g(n)-É 178 
E a (178) 
or 
eb(n) 
B(n) = B(n —1) + g(n)— 179 
A A (179) 
To compute eg(n) before having B(n — 1) we substitute (179) in 
en(n) = xa(n) — BT (n)y, y_4(m) (180) 
to find 
eb(n 
ar e mn E Yay_¡(n)g(n) p(n) (181) 
| y(n) 
where we define 
eg(n|n — 1) = Xa(n) — B*(n — 1)y, y_,(n) (182) 
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Now (181) can be simplified to 





eB(n) = eB(n|n — 1) + (+(m) — 1) E = ef (n|n — 1)>(n) (183) 
Therefore 
B(n) = B(n — 1) + g(n)ep(n|n — 1) (184) 


We note here that the update of g(n) depends on B(n) and vice-versa, but 
we will address that problem shortly. 
To compute Lg(n) we use (71) with U = Yow-i(n) , V = a(n) and 
Z = y = X p(n), and also (75) to obtain 
ep(n)eb(n 
ໃ.“ elit) = »B(n) — en (185) 


Then using (183) 








Po. (nm SO 
Ein) dl | Xp(n) = Eg(n) — es(n)eg(n|n — 1) (186) 
a 0 
and simplifying we find 
Sp(n — 1) = Es(n) — ep(n)ep(n|n — 1) (187) 
or 
Se(n) = Es(n — 1) + es(neg(n|n — 1) (188) 


We are now ready to solve the problem of mutual dependence between g(n) 
and B(n). For this refer to (161) and (162). Substituting (184) in (162) we obtain 
gín) = Mín) + Bín — 1)m(n) + gín)eÿ(n|n — 1)m(n) (189) 
Now note that since 
ep(n|n — 1)m(n) = ep(n|n — 1)Šg`(n)es(n) (190) 
1s a scalar, (189) becomes 
g(n) = [M(n) + Bin — 1)m(n)] (1 = eZ(n{n - 1)m(n))-" (191) 


and the problem is solved. 
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6. Angle Parameter Update 


The final relation need to complete the recursion is an update formula for 
y(n). This update is performed in a manner similar to the update of g(n); that is we 
compute the angle parameter related to the data matrix Yo,n(n) using two different 
approaches and then equate them. By the same procedure that lead to (118) we 
define 


y'(n) =1-y7 y (n)Kow(n)z(n) (192) 


Then using (154) for Ko y(n)z(n) and partitioning y, , (n) into 


Yo x(n) = |xE(n),y7 (n)] Ye (193) 
we obtain 
y (n)=1-— > (ນ). ນໃນ) ະພູ g'(n) (194) 
I 


and using (152) and sımplifying we find 
1 (n) = 3(n — 1) — ep(n)SƑ`(n)er(n) (195) 


If we now use (157) for Ko nx(n)x(n) and partition Ve as 


Yon) = [YI (”).xB(n)] Ue (196) 
we obtain 
y(n) =1- |yny_¡(),x5(n)| g2 g"(n) (197) 
I 


y(n) = 3(n) — ep(n)Sp`(n)ep(n) (198) 
but using (183) and (161) we obtain 
y(n) = y (n) 11 — eg(n|n — 1)m(n))” (199) 
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where everything in the right side of (199) is available. 
Since E g' (n) does not appear explicitly in any of the updates we need only 
to update DE'(n). This can be easily done using the matrix inversion lemma. The 


specific procedure 1s outlined below. 


7. Inverse Matrix Update 


Although all the quantities have now been derived that permit the recursion 
to continue, we note that the inverse matrix Lz’, and not the matrix itself occurs in 
the recursions. This is a fairly smal] matrix and could be inverted directly. However it 
is more efficient to also compute the inverses recursively. This is easily accomplished 


using the matrix inversion lemma [Ref. 5, 7]. We have from (175) 
Ue(n) = Ue(n —1) +er(n)ek(n|n — 1) (200) 


or using (183) we have 


v vinl e6 in) (201) 
a) 
The matrix inversion lemma states that if 
A=E+FG"'F’ (202) 
then 
A?=E!-~E'F/G+F'E'F FE" (203) 
For 
AS =) (204) 
E =) (205) 
F = er(n) (206) 
G = y(n-1) (207) 


we obtain 


Sr (n - 1)er(n)eP(n)XƑ (n — 1) 


A -- E ຽດ 


(208) 


The computation of DF'(n) using the matrix lemma amounts to 1.5(M +1)2+4+2.5(M + 
1) multiplications and one division. 
Although the matrix DZ'(n) could be computed in a similar way, the in- 


verse of Xg(n) is not needed for the 2-D filter. 


H. ALGORITHM SUMMARY 
1. Computational Complexity 


The computational cost of the algorithm depends on the shape and size of 
the filter mask. For each mask we must define, as mentioned before. both a forward 
and a backward multichannel signal a number of channels equal to the number of new 
points acquired or dropped off by the mask. Call this number Ay. For the quarter 
plane filter we used A, = Af +1. Now further define ‘2 as the number of coefficients 
in the 2-D filter mask (for the case used in the derivation A, = (Af +1)(.V +1) —-1). 
The computational cost of the algorithm depends only on these two numbers. Note 
that the use of the permutation matrices only changes the ordering of the elements in 
the matrices affected by them, allowing the procedure to be used for any shape and 
size of filter. The permutation can be obtained without any multiplications, hence 
it will not be considered in this analysis. We also mentioned before the use of a 
forgetting factor A in the cost function to handle nonstationary signals. The effect of 
this constant in the final algorithm shows up only in the computation of the inverse 
error covariance matrix for the forward multichannel filter: 

Up (n — ler(njep(n)Ep (n — 1) 
Ay(n — 1) + e#(n))Ƒ`(n — 1)er(n) 


This increases the computational cost by K1 multiplications. A detailed count of the 


Du) Ve (SP0 = 1) - (209) 


number of operations required by the algorithm is given in subsection 3 below. The 
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method used for the 1-D RLS can be applied to 2-D signals by forming the K: x N; 
2-D deterministic correlation matrix and the A, x 1 vector of deterministic cross- 
correlation terms between the desired filter response and the filter inputs. This form 
of 2-D RLS algorithm requires 1.5h3 +4.5K, operations per iteration which increases 


quadratically with hs. 
2. Initial Conditions 


The recursive implementation of the algorithm requires some initialization 
for the variables used. The assumption that the signal is zero before iteration n = 0 
is reasonable and suggests that all the filter parameters including those of the gain 
filter. should all be set to zero. This choice of initial conditions implies that the 
angle parameter 3(0) must be set to 1.0 since all of the subspaces associated with 
previous data are the null space. However, a positive forward prediction error energy 


is necessary for the algorithm to start. The initial conditions used are therefore 


aU EO (210) 
EO O (211) 
BOS = 20 (212) 
g(0) = O (213) 
O S (214) 
Sr = Tune (215) 

6 = small positive constant (216) 


3. Iteration at time n and Required Arithmetic Operations 


The terms to be computed at each iteration, and their formulas are given 


below. 


— A priori 2-D prediction error (A, operations) 
cq(n|n — 1) = y (n) — y; yinja(n — 1) (217) 
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- 2-D filter update (/Y2 operations) 
a(n) = a(n — 1) + g(n — 1)er(nin — 1) (218) 
— A priori multichannel forward prediction error (A, 2 operations) 
er(n|n — 1) = xp(n) — F7(n — Pyare) (219) 
— Multichannel forward prediction error (Xj; operations) 
er(n) = er(njn — 1)y(n — 1) (220) 


— Inverse error covariance matrix for the multichannel forward filter (1.5? + 


2.5h, operations) 


Sr (n - Der(n)er(n)Er (n-1) 


Ep (n) = Ep (n — 1) - AS dos AS 221 
ISO im =1) + eE(n)SE Tín — 1)er(n) = 
— Multichannel forward filter update (A, A, operations) 
Roo e e T(n|n — 1) (222) 
- Extended gain transversal filter (K? + K, K3 operations) 
M n O ! I 4! ! 
TY“. ແເ... 4 .ໃ22 
m(n) g(n= 1) —F(n) 
- Extended angle parameter (K, operations using previous results) 
1(n) = Tín — 1) — ep(n) 27" (njer(n) (224) 


— A priori multichannel backward prediction error (A; 2 operations) 


es(nin — 1) = x5(n) - BY (n — a (e (225) 


- Angle parameter (A + 1 operations) 


3(n) = 3 (n)[1 — eb(n|n — 1)m(n)] (226) 
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— Gain transversal filter (A, A» + A» operations using previous results) 
g(n) = [M(n) + B(n — 1)m(n)] (1 — eS(n|n — 1)m(n))” (227) 
- Multichannel backward filter update (A, A» operations) 
B(n) = B(n — 1) + g(n)ep(n|n — 1) (228) 


The total number of operations (multiplications or divisions) required per iteration 


by the algorithm 1s 


25M +6NIW; +4.5N¡+3M¿ +] (229) 
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IV. EXTENSIONS, APPLICATIONS AND RESULTS 


The 2-D FRLS algorithm was developed in the previous chapter for a 2-D pre- 
diction filter whose observed signal was the same as the sequence under the filter 
mask. In this particular case the coefficients of a(n) for the 2-D filter, are identical to 
the coefficients of the first column f;(n) of the multichannel filter F(n). To see this, 
note that by definition, the error energy for the multichannel forward prediction filter 
(87) can be rewritten as the summation of the error covariance associated with each 
of the (M+1) channels of the forward prediction filter, where the first term is a(n). 


H 


i.e., the sum of squared errors for the first channel (44). We can rewrite (87) as 
fr[Yr(n)] = tr[Ep(n)Er(n)] = a(n) + Tr(n) (230) 


where Y p(n) is independent of the coefficients in f,(n). The 2-D filter coefficients and 
the coefficients in the first column of the forward multichannel filter are the result of 
minimizing the same cost function and are thus identical. 

We will now consider the case when the data sequence under the prediction 
mask is distinct from the observed sequence (general FIR Wiener filter) and also the 
case when the filter mask covers not only observation data, but also other input data 
sequence (ARMA model). Following that we will present the results of computer 
simulations to illustrate the applications of the adaptive algorithms for 2-D signal 


processing. 


A. GENERAL FIR WIENER FILTER 


The extension of the 2-D FRLS to the general FIR Wiener filtering problem is 
straightforward to obtain by following the same concepts presented in chapter 3. To 
be specific we again consider a first quadrant (N +1) x (Af +1) filter here. However, 


the procedure applies more generally to nonsymmetric half plane and other filters as 
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discussed in section H of chapter 3. The multichannel notation developed in chapter 
3 is used here to define two K x L 2-D sequences d;(n) and y(n), where di(n) is the 
sequence we want to estimate based upon the input sequence y(n). Our goal is to 


find a prediction filter of the form 


a 


din) < y7 „(n)a(n) (231) 


with y, y defined as in (39) and a(n) defined as in (41) that minimizes the sum of 


squared errors 
a(n) = Pla (or (232) 
t=0 
where the prediction error e(n) is given by 
eiln) = dı(n) = dı(n) (239) 
This can be written in vector notation as 
a(n) = e (n)es(n) (234) 
where 
e(n) = di(n)-di(n) (235) 


with d,(n) defined as (n + 1)-dimensional vector that contains the observation data 


from the origin up to point n. Then the estimate d,(n) is given by 
d,(n)= Yın(n)a(n) (236) 


where Y, n(n) is the same data matrix as in (47) and (48). The least squares solution 


for a(n) is once more given by the pseudo-inverse 
-] 
an) = (Yırla)Yın{n)) Yınla)dı(n) (237) 


and the estimate of d,(n) and the prediction error can also be expressed in terms of 


the projection matrices defined in chapter 3. The estimate d,(n) is the projection of 
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d,(n) onto the subspace spanned by the input data 

dı(n) = Pır(n)dı(n) (238) 
The error e,(n) is orthogonal to the estimate d,(n) and is given by 

e(n) = Pin(n)di(n) (239) 
The operator Kı, n(n) defined in (55) can be used to rewrite a(n) as 


a(n) = Kin(n)y,(n) (240) 


We already know how to obtain Ki n(n) from Ki; n(n — 1) in a efficient way, 
thus we are able update a(m — 1) to a(n) as soon as di(n) is available. The complete 
algorithm is the same as the one summarized in section H of chapter 3 with y,(n) 


replaced by d,(n) in (217). 
B. ARMA MODEL 


The ARMA version of the 2-D FRLS can be viewed as follows. Let us call 
the output or observed data yı(n) and the input data w,(n). For the present let 
us assume that this latter sequence is also known or observed. Let us separate the 
coefficients that operate on the two different sequences and call a(n) the vector of 
AR coefficients of the filter, and b(n) the vector of MA coefficients of the filter. As 
before, we develop this extension of the 2-D FRLS to ARMA models assuming a first 
quadrant (N +1) x(Af+1) quarter plane mask for both the AR and MA components 
of the filter, noting that more general forms are possible. Using the scanning index n 


defined before, we proceed by defining an ARMA prediction filter of the form 


x(n) = y7 (na(n) + w? y(n) b(n) (241) 


with y, „(n) and w,,„(n) defined using the same concepts as in (39). We want to find 


a(n) and b(n), the filter coefficients defined respectively for each mask, to minimize 
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the sum of squared errors 
n 
a(n) = Deli) 
1=0 
where the prediction error is given by 


ˆ 


ej(n) = y (n) — yi(n) 


This can be written in vector notation as 


with 

en) = y,(n)-y,(n) 
We can combine the AR and MA coefficients in one single vector c(n) as 

ein) = [af(n),b (n)Jƒ 
The data under the mask can then be expressed as 

ຂາກ) = [y] p(n) wi aln) 
Now we have for the estimate of y, (n) 
ý (n) = Zi y(n)e(n) 

where Zi, n(n) is the data matrix 


Zin(n) = [Yin(n), Win(n)] 


(249) 


formed by Yı n(n) and Wi n(n), the data matrices associated respectively with the 


AR and the MA masks with the same structure as (47) and (48). The least squares 


solution for c(n) is given by the pseudo-inverse of Z; n(n) 


ein) = (Z()Z nn) Ziv(n)y(n) 


(250) 


After defining new projection matrices and transversal filter operators associated with 


the new data matrices, the algorithm to recursively update c(n) closely follows the 


procedures developed in chapter 3. 
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C. ARMA MODELING WITH UNKNOWN INPUT 


In the previous section on 2-D ARMA modeling, it was assumed that the se- 
quence w,(n) was known and available. This is an ideal situation which could some- 


times exist for example in a system identification problem (see Figure 7). Given both 


UNKNOWN | y(n) 
SYSTEM 








ADAPTIVE 
TF 


Figure 7. Modeling with Known Input 


the input sequence w;(n), and the output sequence y(n) and an assumed linearity 
and order for the model, we have all the information necessary to identify the unknown 
system under analysis. knowledge of the input sequence is not always available, how- 
ever this is the case, for example, in the problem of estimating the parameters of an 
ARMA model where w;(n) is a 2-D white noise sequence. The ARMA parameter 
estimation problem for unknown input was addressed using recursive algorithms for 
1-D signals by embedding the AR and MA estimation in a 2-Channel AR modeling 
problem [Ref. 2]. The difficulties with the procedures suggested are similar here; the 


2-D white noise process w,(n) is not known and needs to be estimated from the data. 


91 


Assume that at some moment n all the data under the MA mask is known 
except the most recent sample of the noise sequence w;(n). Further assume that the 
ARMA filter coefficients estimated so far are fairly close to the actual coefficients 
that characterize the system. The natural choice for an estimate for the unknown 
noise sample is the error ey(n), 1.e., we expect the error to be zero if the noise sample 


was known. This procedure is known as ”bootstrapping” [Ref. 2] (see Figure 8). It 


wn) (n) 
| UNKNOWN | Yn 
| SYSTEM 


ADAPTÍVE | N 
ITR 


c4 
a 





Figure 8. Modeling with Unknown Input 


involves two steps. First an estimate y;(n) is obtained assuming w,(n) = 0 and using 
the old parameter estimates. Secondly w,(n) is set equal to e;(n) and we proceed as 
in the case of a known input sequence. This method is highly nonlinear and hence 


very difficult to analyze. However it was found to give reasonable results in practice. 


D. SIMULATION RESULTS 


The 2-D FRLS algorithm was tested both on computer-generated data and on 
digitized images. For a baseline reference the 2-D LMS algorithm was also imple- 
mented. The synthetic data for the system identification and parameter estimation 


results was obtained by driving different 2-D transfer functions with computer gen- 
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erated white Gaussian noise. Most of the tests were performed on 32 x 32 point 
2-D data sequences. Image coding was performed for 256 x 256 pixel images using 
the VICOM system for display purposes and a VAX-785 computer for the algorithm 


implementation. The algorithms were coded in Fortran. 


1. 2-D System Identification 


The first computer simulation in system identification was performed using 
a (2x2) MA model. A computer-generated white Gaussian noise sequence was applied 
both to a filter with known coefficients and to the adaptive filter in the manner of 
Figure 7. The error between the output of the two filters was used to adjust the 


coefficients of the adaptive filter. The MA filter had the form 


d(my,n2) = b0.0)y(ni, no) + 6(0, Dy(ni,no — 1) Coal) 


+ 0(1.0)ບູ(ໃ1 — 1,n¿) + Ù(1.1)0(mị — 1,nạ — ]) 


with 
b(0.0) = 1.0 (252) 
b(0.1) = 0.6 (253) 
b(1,0) = -0.3 (254) 
b(1,1) = 0.3 (255) 


The rate of convergence is shown in Figure 9. As can be seen, each of the coefficients 
converged very rapidly to the actual value. The 2-D LMS algorithm was also imple- 
mented for this case, but as can be seen in Figure 10 the convergence rate is very 
slow. 

The next simulation was performed using an ARMA model where both 
the AR and the MA masks were first order nonsymmetrical half plane filters. A 
computer-generated white Gaussian noise sequence was applied both to a filter with 


known coefficients and to the adaptive filter. The error between the output of the 
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Figure 9. System Identification MA Model 


two filters was used to adjust the coefficients of the adaptive filter. The ARMA filter 


had the form 


y(ni,n3) a(1,0)y(n; — 1,n2) + a(—1,1)y(n; + 1,n2— 1) (256) 
+ a(0,1)y(n,,n2— 1) + a(1,1)y(n; — 1,n2 — 1) 
+  b(1,0)w(n, — 1,n2) + b(—1, 1)w(n; + 1,n2 — 1) 


+  b(0,1)w(n,,n2 — 1) +b(1, 1)w(n; — 1,n2-—1) 


with 


a0 ວດ (257) 
a(—1,1) = —0.l (258) 
a(0,1) = 0.4 (259) 
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Figure 10. System Identification MA Model (2-D LMS) 


a(1,1) = —0.5 
b(1,0) = —0.2 
A) 01 
b(0,1) = 0.8 
b(1,1) = —0.5 


tterations to the true value. 


The rate of convergence is shown in Figure 11 and Figure 12 for both the AR and the 


MA coefficients. As can be seen there, each of the coefficients converged in about 80 


To test the behavior of the algorithm with non-stationary data the same 


dO 


model was run with data obtained by changing some of the coefficients at iteration 
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Figure 11. System Identification ARMA Model (AR coeff. stationary 
data) 


120 to 
at YO) ean OEE (265) 
a(—1,1) = 0.6 (266) 
b(1,0) = 0.4 (267) 
01, . ເມຕ (268) 


As shown ın Figure 13 and Figure 14, the AR and the MA coefficients converged to 
the true initial values, but at iteration 120 when some of the coefficients were changed 
the algorithm started slowly tracking the new coefficients. Since no forgetting factor 
was used here, the algorithm does not forget the initial data and the convergence is 


very slow. The estimated coefficients remain biased. By using a forgetting factor 
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Figure 12. System Identification ARMA Model (MA coeff. stationary 
data) 

of À = 0.95 we obtained the results shown in Figure 15 and Figure 16. With the 
introduction of this forgetting factor the filter coefficients were able to lock on the 


new coefficients in about 150 more iterations. 


2. 2-D Parameter Estimation 


The first computer simulation in parameter estimation was performed us- 
ing a first order nonsymmetric half plane AR model. A computer- generated white 
Gaussian noise sequence was applied to a 2-D AR filter with known coefficients to 
obtain the data. The adaptive filter has access only to the AR (output) sequence as 


shown in Figure 8. The error between the output of the two filters was used to adjust 


of 
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Figure 13. System Identification ARMA Model (AR coeff. nonstationary 
data) A =Y 


the cocflicients of the adaptive filter. The AR filter had the form 


y(n,,n2) = a(1,0)y(m, — 1,2) + a(—1,1)y(mi + 1,22 -1) (269) 


+ a(0,1)y(n,,n2 — 1) + a(1,1)y(n1 — 1,n2 — 1) + (mạ, n2) 


with 


a(1,0) = 0.8 (270) 
a(—1,1) = —0.1 (271) 
a(0,1) = 0.4 (272) 
a(1,1) = -05 (273) 
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Figure 14. System Identification ARMA Model (MA coeff. nonstationary 
Gata) A = 1.0 

The rate of convergence is shown in Figure 17 and as can be seen each of the coefh- 
cients converged to values close to the true values. Since the algorithm does not know 
the input sequence and has to estimate it, there is a slight variation of the estimated 
coefficient around the true values. Here again the 2-D LMS algorithm was imple- 
mented, but as can be seen from Figure 18 some of the coefficients did not converge 
even after 900 iterations. 

Next the modeling procedure with unknown input was tested for the ARMA 
case. A first order nonsymunetric half plane mask was used for both the AR and MA 
coefficients. A computer-generated white Gaussian noise sequence was applied to a 
filter with known coefficients to obtain the ARMA data. The adaptive filter does not 


have access to the driving sequence. The error between the output of the two filters 
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Figure 15. System Identification ARMA Model (AR coeff. nonstationary 
data) À = 0.95 


was used to adjust the coefficients of the adaptive filter. The ARMA filter had the 


form 
y(ny,n2) = a(1,0)y(m, — 1,n2) + a(—1,1)y(m + 1,22 — 1) (274) 
+ a(0, 1)y(n,,n2 — 1) + a(1, 1)y(n, — 1,n2 — 1) 
+ w(ni na) + b(1,0)w(n, — 1,n3) + b(—1, 1)w(n, + 1,n, — 1) 
+ d(0,1)rwo(n;,n2 — 1) + 5(1, 1)w(n, — 1,n2 — 1) 
with 


Q 
~ 
— 
O 
ar 
[| 
oO 
00 


(275) 


abe en (276) 


a EI TIRADO 
ARMA TH ita) CHRP ICIS! ig 
NON-STATIONARY CATA LAMBOA 


A 


. > 


SU 


ເ) 








MAGNTTUDE 





2 20.0 186.0 273.0 220.0 69.0 540.0 630.0 720.0 810.0 900.0 
ITERATION 


Figure 16. System Identification ARMA Model (MA coeff. nonstationary 
data) À = 0.95 


a(0,1) = 04 (277) 
a(1,1) = —0.5 (278) 
b(1,0) = —0.2 (279) 
b(-1,1) = 0.1 (280) 
b(0,1) = 0.8 (281) 
b(1,1) = -05 (282) 


The rate of convergence is shown in Figure 19 and Figure 20 and as can be seen each 
of the AR coefficients converged to values close to the true values. This is similar to 
what happened for the AR parameter estimation problem. The MA coefficients also 


converged to values acceptably close to the actual MA values, but they show some 
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Figure 17. Parameter Estimation AR Model 


bias. As in the previous case the algorithm does not have knowledge of the input 
sequence, hence estimates it using the bootstrapping scheme. Although this method 15 


difficult to analyze due to its non-linear nature, the results obtained are encouraging. 
3. Image Coding 


An image coding problem was also used to test the adaptive algorithm. 
Two black and white images, Figure 21 and Figure 22, with 256 x 256 pixels were the 
2-D sequences used to perform AR and ARMA parameter estimation as described 
above. The 2-D LMS algorithm was also applied to the images to estimate the AR 
parameters. These parameters were then used to form the linear predictor used in 
the coding and decoding scheme shown in Figure 23. A two level quantizer with the 


step size value taken from the Max table was used, assuming that the error sequence 
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Figure 18. Parameter Estimation AR Model (2-D LMS) 


obtained had a Gaussian distribution [Ref. 12). This resulted in a quantized error 
sequence corresponding to one bit per pixel. The image reconstruction was performed 
by driving the inverse filter with the quantized error sequence. 

The first image was reconstructed after being encoded using the three 
different linear predictors and the results are shown in Figures 24, 25 and 26. The 
error images between the original image and the reconstructed images are shown in 
Figures 27, 28 and 29. One of the most widely used measures for the performance 
of a predictive coder is the signal-to-noise ratio (SNR). For a K x L 2-D sequence 
y(n, 2), it can be defined as follows: 

K-] L-] 


*(n1, 72) 
oN Tie, = 10do 2ny=0 2n9=0 3 Mi M2) 283 
ຫ SS DS (mano) (283) 
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Figure 19. Parameter Estimation ARMA Model (AR coeff.) 


The subjective quality of the reconstructed images agrees with the signal- 


to-noise ratio (SNR) obtained for each case. 


2D LMS (AR) SNR = 15.96dB (284) 
2-D FRLS(AR) SNR = 17.24dB (285) 
2-D FRLS (ARMA) SNR = 18.29dB (286) 


The better performance of the 2-D FRLS when compared with the 2-D LMS is ap- 
parent in the results. The improvements obtained for this image with the ARMA 
model imply that the model was able to fit the image better than the AR model and, 
thus produce an error sequence that was more nearly white. 

The algorithm was also tested on the second image (Figure 22). The results 
are shown in Figures 30, 31 and 32. The error images between the original image 


and the reconstructed images are shown in Figures 33, 34 and 35. The subjective 
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Figure 20. Parameter Estimation ARMA Model (MA coeff.) 


quality of the reconstructed images once again agrees with the signal-to-noise ratio 
(SNR) obtained for each case, but the quality difference between the reconstructed 


images obtained using different methods is smaller. 


2-D LMS (AR) SNR = 17.25dB (287) 
2-D FRLS (AR) SNR = 18.16dB (288) 
2-D FRLS (ARMA) SNR = 18.62dB (289) 


In particular, the improvements obtained with the ARMA model for this case are 
not as large. This is probably because the second image has a large number of sharp 


edges, and these are quite difficult to model with any finite order linear model. 
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Figure 22. Image 2 Original 
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Figure 23. Predictive coding. (taken from [Ref. 6]) 
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Figure 24. Image 1 Reconstructed 2-D LMS (AR) 





Figure 25. Image 1 Reconstructed 2-D FRLS (AR) 
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Figure 26. Image 1 Reconstructed 2-D FRLS (ARMA) 
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Figure 27. Image 1 Error 2-D LMS (AR) 





Figure 28. Image 1 Error 2-D FRLS (AR) 
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Figure 29. Image 1 Error 2-D FRLS (ARMA) 
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Figure 30. Image 2 Reconstructed 2-D LMS (AR) 





Figure 32. Image 2 Reconstructed 2-D FRLS (ARMA) 
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Figure 33. Image 2 Error 2-D LMS (AR) 





Figure 34. Image 2 Error 2-D FRLS (AR) 
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Figure 35. Image 2 Error 2-D FRLS (ARMA) 
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V. CONCLUSIONS AND RECOMMENDATIONS 


A Two-Dimensional Fast Recursive Least Squares (2-D FRLS) algorithm was 
developed using a geometric formulation. The derivation is based on the relation 
between least squares prediction and the concepts of orthogonality associated with 
vector spaces. The ordering necessary to develop the recursive algorithm was imposed 
on the data by using a linear scanning index. 

A substantial reduction in computational cost is obtained when compared with 
the basic 2-D RLS algorithm. The 2-D FRLS algorithm requires on the order of 
DK Ky arithmetic operations per iteration compared with 1.54? for the basic RLS, 
where A, is the number of channels defined for the 2-D FRLS algorithm and Kk is 
the total number of coefficients in the 2-D filter. The 2-D LMS algorithm, due to 
its simplicity, is still more economical than our algorithm in terms of computational 
cost, but lacks the excellent convergence performance experienced for the 2-D FRLS. 

The work described here could be extended in several different ways. First a 
thorough investigation of the behavior of the algorithm when using finite word length 
implementation as well as different forgetting factors could be developed. Secondly, 
techniques used for the 1-D fast RLS to obtain further reductions of the computa- 
tional cost could be investigated. In particular a variant called the gain normalized 
Fast Transversal Filter [Ref. 6] seems to be applicable to the 2-D FRLS. Its derivation 
however does not follow directly from the geometrical approach presented here. Fi- 
nally, the algorithm could be tested in other areas including 2-D parametric spectral 


estimation. 
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APPENDIX PROJECTION MATRIX UPDATE 


This appendix derives the results 








Puln-l) O 
Pu,(n) = | ) (290) 
0 1 
and 
Ku(n -1 0 
Ku;(n) = | (291) 
-u’Kufn —1) 1 
used in Chapter 3. 
Begin by noting that the data matrix [U(n), z(n)] can be partitioned as 
0 
0 
U(n-1) 0 
[U(n),x(n))= | U(n) = (292) 
ມ 1 





ar 


Puz(n) = [U(n),2(n)] ((U(n),2(n)]? [U(n), 2(2))) [U(n).z(m)JÏ— (293) 


1 


= n= uu’ u 
Ul (n — 1)U( 1)+uu Ñ (294) 


T 1 





ú 
Here we will use the relation for the inverse of a symmetrical matrix by partitioning 


el 131. If: 








with C = B! and A and D square matrices, then 


A`! + PQPI POr 


M7 = E (296) 
-QP Q> 
with 
P=A!B 
and 
Q=D-CP 


For our case we have 


A = U’n-1)Un-1)+uu’ (297) 
B = u (298) 
C = uf (299) 
D = 1 (300) 


The matrix inversion lemma [Ref. 5, 7} is used first to obtain A”? as follows. 


The matrix inversion lemma states that if 


A = E+FG"F (301) 
then 
A! = EU-E PF [G+F ETE] FE? (302) 
Now taking 
E = |Uf(»—1)U(n - ])| (303) 
Cam (305) 
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we have 


=} 
A! L Pa. Lo Et Es 


K=scalar 





bil uu! E! 
1+ 4 


E! — 


where K = u! E"lu. The upper left partition of (296) becomes 


A`? we PQ! P? = A`? a: Awe 1 z Au) N! 


. pao E uu E! 
u l+k 
EL uh 
= ດ E-! uu! E! E-! uu! E! 


E 
= ET = [UT(n — 1JU(n-1)]" 


To find the other partitions we write 





—] K 

5 > po e 

1+ 1 

= 
ES | (UT(n — 1)U(n — 1)) u 
SRS l+ K 
and 
UE u 1 


ະ En Er 
Q ` I+h I+ñh 


thus we have 
-PQ = (U’(n-1Un-1)) u 


Now substituting (311), (313) and (314) in (296) we obtain 


—1 


ເມ — 1)U(n — 1)) 
y? (UT (n — 1)U(n — ie h +41 


I = 


=] 


~] 


Ku* E”! 


1+ K 


— (UT(n-1)U(n-1)) "a 


(308) 


(309) 


(310) 


(311) 


(313) 


(314) 


(315) 


Postmultiplying this result by the transpose of (292) we obtain 





Ky(n — 1) 0 
Ku;(n) = (316) 
—u?Ky(n—1) 1 
Finally, premultiplying by (292), (293) becomes 
Pu(n-l) O 
Bee \ EKO 
02 | 





Q.E.D. 
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Adaptive two dimensional RLS algorithms. 
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